Multiple beamforming is realized by singular value decomposition of the channel matrix, which is assumed 
to be known to both the transmitter and the receiver. Bit-Interleaved Coded Multiple Beamforming (BICMB) can 
achieve full diversity as long as the code rate $R_c$ and the number of employed subchannels $S$ satisfy the condition 
$R_c S \leq 1$. Bit-Interleaved Coded Multiple Beamforming with Constellation Precoding (BICMB-CP), on the other 
hand, can achieve full diversity without the condition $R_c S \leq 1$. However, the decoding complexity of BICMB-CP 
is much higher than that of BICMB. In this paper, a reduced complexity decoding technique based on Sphere 
Decoding (SD) is proposed to reduce the complexity of Maximum Likelihood (ML) decoding for BICMB-CP. The 
proposed decoding achieves a reduction of several orders of magnitude, in terms of the average number of 
real multiplications needed to acquire one precoded bit metric, not only with respect to conventional ML decoding 
but also with respect to conventional SD. 
I. Introduction 

Beamforming is employed in a Multi-Input Multi-Output (MIMO) system to achieve spatial multiplexing$^1$ 
and thereby increase the data rate, or to enhance the performance, when channel state information is 
available at the transmitter [3]. A set of beamforming vectors is obtained by Singular Value Decomposition 
(SVD), which is optimal in terms of minimizing the average Bit Error Rate (BER) [4]. 

It is known that an SVD subchannel with a larger singular value provides a greater diversity gain. Spatial 
multiplexing without channel coding results in the loss of the full diversity order [5]. To overcome 
the diversity order degradation of multiple beamforming, Bit-Interleaved Coded Multiple Beamforming 
(BICMB) was proposed [6], [7]. BICMB can achieve the full diversity order offered by the channel as 
long as the code rate $R_c$ and the number of subchannels used $S$ satisfy the condition $R_c S \leq 1$ [8]. 

Bit-Interleaved Coded Multiple Beamforming with Constellation Precoding (BICMB-CP) converts a 
symbol into a precoded symbol and distributes it over subchannels [9]. Adding the constellation 
precoder to BICMB with a code rate $R_c$ greater than $1/S$ provides full diversity when the 
subchannels for transmitting the precoded symbols are properly chosen. However, BICMB-CP has a 
higher decoding complexity than BICMB. 

In this paper, Sphere Decoding (SD) with an initial radius acquired by Zero-Forcing Decision Feedback 
Equalization (ZF-DFE) is used to calculate bit metrics of precoded symbols. The initial radius calculated 
by ZF-DFE [10], which is also the metric weight of the Babai point [11], ensures no empty spheres. 
Based on SD, two techniques are applied to reduce the number of executions carried out by SD and the 
computational complexity of each SD execution, respectively. Conventional SD substantially reduces the 
complexity, in terms of the average number of real multiplications needed to acquire one precoded bit 
metric, compared with exhaustive search. With the techniques proposed in this paper, further reductions of 
orders of magnitude are achieved. The reduction becomes larger as the constellation precoder dimension 
and the constellation size increase. 

The remainder of this paper is organized as follows: In Section II, the description of BICMB-CP is 
given. In Section III, a reduced complexity decoding technique for BICMB-CP is proposed. In Section 
IV, complexity comparisons for different constellation precoder dimensions or modulation schemes are 
presented. Finally, a conclusion is provided in Section V. 

Notation: Let $\mathrm{diag}[\mathbf{B}_1, \cdots, \mathbf{B}_P]$ stand for a block diagonal matrix with matrices $\mathbf{B}_1, \cdots, \mathbf{B}_P$, and let 
$\mathrm{diag}(b_1, \cdots, b_P)$ be a diagonal matrix with diagonal entries $b_1, \cdots, b_P$. The superscripts $(\cdot)^H$, $(\cdot)^T$, and 
$\bar{(\cdot)}$ stand for conjugate transpose, transpose, and binary complement, respectively. Let $\mathbb{R}^+$ and $\mathbb{C}$ stand for 
the set of positive real numbers and complex numbers, respectively. Finally, let $N_t$ and $N_r$ stand for the 
number of transmit and receive antennas, respectively. 

$^1$In this paper, the term "spatial multiplexing" is used to describe the number of spatial subchannels, as in [1]. Note that the term is 
different from "spatial multiplexing gain" defined in [2]. 

II. BICMB-CP Overview 

Fig. 1 represents the structure of BICMB-CP. First, the convolutional encoder with code rate $R_c = k_c/n_c$, 
possibly combined with a perforation matrix for a high rate punctured code [12], generates the codeword 
$\mathbf{c}$ from the information bits. Then, the spatial interleaver distributes the coded bits into $S \leq \min(N_t, N_r)$ 
streams, each of which is interleaved by an independent bit-wise interleaver $\pi$. The interleaved bits are 
modulated by Gray mapped square QAM onto the complex-valued symbol sequence $\mathbf{X} = [\mathbf{x}_1 \cdots \mathbf{x}_K]$, 
where $\mathbf{x}_k$ is an $S \times 1$ complex-valued symbol vector at the $k$th time instant. It is assumed that each stream 
employs the same $2^M$-QAM constellation, where $M$ is the number of bits labeling a complex-valued 
scalar symbol. Let $\chi \subset \mathbb{C}$ of size $|\chi| = 2^M$ denote the complex-valued signal set of the square QAM. 
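As an illustration of the signal set and of the label-based subsets used later for the bit metrics, the following minimal sketch builds a Gray-labelled square QAM and extracts the symbols whose label has a given bit in a given position. The function names, the binary-reflected Gray code, the unnormalized amplitudes, and the 0-based bit index are assumptions made for this example, not details taken from the paper.

```python
from itertools import product

def gray_pam(bits):
    """Map Gray labels (bit tuples) to the amplitudes of a 2^bits-PAM."""
    levels = 2 ** bits
    table = {}
    for j in range(levels):
        gray = j ^ (j >> 1)  # binary-reflected Gray code of the level index
        label = tuple((gray >> (bits - 1 - t)) & 1 for t in range(bits))
        table[label] = 2 * j - (levels - 1)  # amplitudes -(L-1), ..., L-1
    return table

def gray_qam(M):
    """Square 2^M-QAM: first M/2 label bits -> real axis, last M/2 -> imaginary axis."""
    pam = gray_pam(M // 2)
    return {lab_re + lab_im: complex(a_re, a_im)
            for (lab_re, a_re), (lab_im, a_im) in product(pam.items(), pam.items())}

def bit_subset(constellation, i, b):
    """Symbols whose label carries bit b in position i (0-based in this sketch)."""
    return {s for label, s in constellation.items() if label[i] == b}

chi = gray_qam(4)              # 16-QAM, unnormalized
chi_0 = bit_subset(chi, 0, 0)  # labels with a 0 in the first bit position
chi_1 = bit_subset(chi, 0, 1)  # labels with a 1 in the first bit position
print(len(chi), len(chi_0), len(chi_1))
```

Each bit position splits the constellation into two halves of equal size, which is why fixing one bit halves the number of candidates for that symbol.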

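The complex-valued symbol vector $\mathbf{x}_k$ is multiplied by the $S \times S$ precoder $\mathbf{\Theta}$, which is defined as 

$$\mathbf{\Theta} = \begin{bmatrix} \mathbf{\Theta}_P & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{S-P} \end{bmatrix}, \tag{1}$$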

where $\mathbf{\Theta}_P$ is the $P \times P$ unitary constellation precoding matrix that precodes the first $P$ modulated entries of 
$\mathbf{x}_k$. The system is called Bit-Interleaved Coded Multiple Beamforming with Full Precoding (BICMB-FP) 
when all of the $S$ modulated entries are precoded; otherwise it is called Bit-Interleaved Coded Multiple 
Beamforming with Partial Precoding (BICMB-PP). The symbol generated by $\mathbf{\Theta}$ is multiplied by $\mathbf{T}$, which 
is an $S \times S$ permutation matrix, to map the precoded and non-precoded symbols onto the predetermined 
subchannels. Let us define $\mathbf{b}_p = [b_p(1) \cdots b_p(P)]$ as a vector whose element $b_p(u)$ is the subchannel on 
which the precoded symbols are transmitted, ordered increasingly such that $b_p(u) < b_p(v)$ for $u < v$. 
In the same way, $\mathbf{b}_n = [b_n(1) \cdots b_n(S-P)]$ is defined as an increasingly ordered vector whose element 
$b_n(u)$ is the subchannel which carries the non-precoded symbols. 

The MIMO channel $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ is assumed to be quasi-static, Rayleigh, and flat fading, and perfectly 
known to both the transmitter and the receiver. Assume that the channel coefficients remain constant for 
a block of $K$ symbols. The beamforming vectors are determined by the SVD of the MIMO channel, 
i.e., $\mathbf{H} = \mathbf{U} \mathbf{\Lambda} \mathbf{V}^H$, where $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices, and $\mathbf{\Lambda}$ is a diagonal matrix whose $s$th diagonal 
element, $\lambda_s \in \mathbb{R}^+$, is a singular value of $\mathbf{H}$ in decreasing order. When $S$ scalar symbols are transmitted 
at the same time, the first $S$ vectors of $\mathbf{U}$ and $\mathbf{V}$ are chosen to be used as beamforming matrices at 
the receiver and the transmitter, respectively. In Fig. 1, $\mathbf{U}_S$ and $\mathbf{V}_S$ denote the first $S$ column vectors of 
$\mathbf{U}$ and $\mathbf{V}$, respectively. 
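The effect of SVD-based beamforming can be checked numerically. The sketch below, which assumes NumPy and uses an arbitrary random channel with hypothetical dimensions, verifies that applying the first $S$ columns of $\mathbf{V}$ at the transmitter and of $\mathbf{U}$ at the receiver turns the channel into $S$ parallel subchannels whose gains are the largest singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, Nr, S = 4, 4, 2  # hypothetical antenna counts and number of subchannels

# Rayleigh flat-fading channel: i.i.d. complex Gaussian, zero mean, unit variance.
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)

# numpy returns H = U @ diag(lam) @ Vh with singular values in decreasing order.
U, lam, Vh = np.linalg.svd(H)
U_S = U[:, :S]            # receive beamforming vectors
V_S = Vh.conj().T[:, :S]  # transmit beamforming vectors

# Equivalent channel seen by the S symbols after beamforming at both ends.
effective = U_S.conj().T @ H @ V_S
print(np.round(np.abs(effective), 6))
print(np.round(lam[:S], 6))
```

The off-diagonal entries of the equivalent channel vanish up to rounding error, so each symbol sees only its own singular value as a real, positive gain.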

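The spatial interleaver arranges the complex-valued symbol vector as $\mathbf{x}'_k = [(\mathbf{x}^p_k)^T \,\vdots\, (\mathbf{x}^n_k)^T]^T = [x_{k,b_p(1)} \cdots x_{k,b_p(P)} \,\vdots\, x_{k,b_n(1)} \cdots x_{k,b_n(S-P)}]^T$, 
where $\mathbf{x}^p_k$ and $\mathbf{x}^n_k$ are the modulated entries to be transmitted on the subchannels 
specified in $\mathbf{b}_p$ and $\mathbf{b}_n$, respectively. Then, the $S \times 1$ received complex-valued symbol vector at the $k$th 
time instant, $\mathbf{r}_k = [(\mathbf{r}^p_k)^T \,\vdots\, (\mathbf{r}^n_k)^T]^T = [r_{k,1} \cdots r_{k,P} \,\vdots\, r_{k,P+1} \cdots r_{k,S}]^T$, is 

$$\mathbf{r}_k = \mathbf{\Gamma} \mathbf{\Theta} \mathbf{x}'_k + \mathbf{n}_k, \tag{2}$$

where $\mathbf{\Gamma} = \mathrm{diag}[\mathbf{\Gamma}_p, \mathbf{\Gamma}_n]$ is a block diagonal matrix, with diagonal matrices $\mathbf{\Gamma}_p = \mathrm{diag}(\lambda_{b_p(1)}, \cdots, \lambda_{b_p(P)})$ 
and $\mathbf{\Gamma}_n = \mathrm{diag}(\lambda_{b_n(1)}, \cdots, \lambda_{b_n(S-P)})$, and $\mathbf{n}_k = [(\mathbf{n}^p_k)^T \,\vdots\, (\mathbf{n}^n_k)^T]^T = [n_{k,1} \cdots n_{k,P} \,\vdots\, n_{k,P+1} \cdots n_{k,S}]^T$ is a 
complex-valued additive white Gaussian noise vector with zero mean and variance $N_0 = S/\mathrm{SNR}$. The 
channel matrix $\mathbf{H}$ is complex Gaussian with zero mean and unit variance, and to make the received 
Signal-to-Noise Ratio (SNR) equal to $\mathrm{SNR}$, the total transmitted power is scaled as $S$. The input-output relation 
in (2) is decomposed into two equations as 

$$\mathbf{r}^p_k = \mathbf{\Gamma}_p \mathbf{\Theta}_P \mathbf{x}^p_k + \mathbf{n}^p_k, \qquad \mathbf{r}^n_k = \mathbf{\Gamma}_n \mathbf{x}^n_k + \mathbf{n}^n_k. \tag{3}$$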

The location of the coded bit $c_{k'}$ within the complex-valued symbol sequence $\mathbf{X}$ is known as $k' \rightarrow (k, l, i)$, 
where $k$, $l$, and $i$ are the time instant in $\mathbf{X}$, the symbol position in $\mathbf{x}'_k$, and the bit position on 
the label of the scalar symbol $x'_{k,l}$, respectively. Let $\chi^i_b$ denote a subset of $\chi$ whose labels have $b \in \{0, 1\}$ 
in the $i$th bit position. By using the location information and the input-output relation in (2), the receiver 
calculates the Maximum Likelihood (ML) bit metrics for $c_{k'}$ as 

$$\gamma^{l,i}(\mathbf{r}_k, c_{k'}) = \min_{\mathbf{x} \in \xi^{l,i}_{c_{k'}}} \|\mathbf{r}_k - \mathbf{\Gamma} \mathbf{\Theta} \mathbf{x}\|^2, \tag{4}$$

where $\xi^{l,i}_{c_{k'}}$ is a subset of $\chi^S$, defined as 

$$\xi^{l,i}_b = \{\mathbf{x} = [x_1 \cdots x_S]^T : x_{s|s=l} \in \chi^i_b, \text{ and } x_{s|s \neq l} \in \chi\}.$$



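In particular, the bit metrics, equivalent to (4) for partial precoding, are 

$$\gamma^{l,i}(\mathbf{r}_k, c_{k'}) = \begin{cases} \min\limits_{\mathbf{x} \in \psi^{l,i}_{c_{k'}}} \|\mathbf{r}^p_k - \mathbf{\Gamma}_p \mathbf{\Theta}_P \mathbf{x}\|^2, & \text{if } 1 \leq l \leq P, \\ \min\limits_{x \in \chi^i_{c_{k'}}} |r_{k,l} - \lambda_{l'} x|^2, & \text{if } P + 1 \leq l \leq S, \end{cases} \tag{5}$$

where $\psi^{l,i}_{c_{k'}}$ is a subset of $\chi^P$, defined as 

$$\psi^{l,i}_b = \{\mathbf{x} = [x_1 \cdots x_P]^T : x_{v|v=l} \in \chi^i_b, \text{ and } x_{v|v \neq l} \in \chi\},$$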

and $l'$ is an entry in $\mathbf{b}_n$, corresponding to the subchannel mapped by $\mathbf{T}$. Finally, the ML decoder, which 
uses Viterbi decoding, makes decisions according to the rule 

$$\hat{\mathbf{c}} = \arg\min_{\mathbf{c}} \sum_{k'} \gamma^{l,i}(\mathbf{r}_k, c_{k'}). \tag{6}$$

III. Reduced Complexity Decoding for BICMB-CP 

Recall that $l$ is the symbol position in $\mathbf{x}'_k$. If $P + 1 \leq l \leq S$ for (5), the complex-valued scalar symbol 
carrying the coded bit $c_{k'}$ is non-precoded. The non-precoded bit metric is the same as in BICMB and can 
be decoded with low complexity using the technique presented in [13]. 

If $1 \leq l \leq P$ for (5), the complex-valued scalar symbol carrying the coded bit $c_{k'}$ is precoded. The 
computational complexity for the precoded bit metric is much higher than for the non-precoded bit metric. 
Exhaustive search requires complexity that is exponential in the dimension of the constellation precoder and 
grows with the modulation alphabet size. The total number of lattice points needed to be searched is 
$|\chi|^{P-1} |\chi^i_{c_{k'}}| = 2^{MP-1}$. In this section, techniques that reduce the complexity of the precoded bit 
metric calculation are presented. 

A. Calculating Precoded Bit Metrics By SD 

SD is used to reduce the complexity of exhaustive search by only searching lattice points inside a 
sphere with radius $\delta$ [14]. Let $\mathbf{G} = \mathbf{\Gamma}_p \mathbf{\Theta}_P$; then SD is employed to solve 

$$\gamma^{l,i}(\mathbf{r}_k, c_{k'}) = \min_{\mathbf{x} \in \Omega} \|\mathbf{r}^p_k - \mathbf{G} \mathbf{x}\|^2, \tag{7}$$

where $\Omega \subset \psi^{l,i}_{c_{k'}}$, and $\|\mathbf{r}^p_k - \mathbf{G} \mathbf{x}\|^2 < \delta^2$. 
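For reference, the exhaustive search that SD is meant to replace can be sketched as follows. This is a minimal illustration assuming NumPy, 4-QAM, and $P = 2$. The precoder `Theta_P`, the singular values in `Gamma_p`, and the 0-based indices `l` and `i` are hypothetical choices for the example and are not the precoders or conventions of the paper.

```python
import itertools
import numpy as np

# 4-QAM (M = 2) with one label bit per axis and unit average energy.
QAM4 = {(b0, b1): ((1 - 2 * b0) + 1j * (1 - 2 * b1)) / np.sqrt(2)
        for b0 in (0, 1) for b1 in (0, 1)}

def exhaustive_bit_metric(r_p, G, l, i, b, constellation=QAM4):
    """min ||r_p - G x||^2 over all x whose l-th symbol has label bit i equal to b."""
    P = G.shape[1]
    best, visited = np.inf, 0
    for labels in itertools.product(constellation, repeat=P):
        if labels[l][i] != b:
            continue  # candidate is outside the constrained set
        visited += 1
        x = np.array([constellation[lab] for lab in labels])
        best = min(best, float(np.linalg.norm(r_p - G @ x) ** 2))
    return best, visited

Theta_P = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # hypothetical unitary precoder
Gamma_p = np.diag([1.8, 0.6])                       # hypothetical singular values
G = Gamma_p @ Theta_P
x_true = np.array([QAM4[(0, 1)], QAM4[(1, 0)]])
r_p = G @ x_true                                    # noise-free, for illustration only

metric_0, visited = exhaustive_bit_metric(r_p, G, l=0, i=0, b=0)
metric_1, _ = exhaustive_bit_metric(r_p, G, l=0, i=0, b=1)
print(metric_0, metric_1, visited)
```

With $M = 2$ and $P = 2$ the constrained set holds $2^{MP-1} = 8$ candidates, and the count doubles with every additional bit, which is the growth that motivates SD.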
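The $P$-dimensional complex-valued input-output relation of the precoded part in (3) can be transformed 
into a $2P$-dimensional real-valued problem [14]: 

$$\breve{\mathbf{r}}^p_k = \breve{\mathbf{G}} \breve{\mathbf{x}}^p_k + \breve{\mathbf{n}}^p_k, \tag{8}$$

where $\breve{\mathbf{r}}^p_k$, $\breve{\mathbf{G}}$, $\breve{\mathbf{x}}^p_k$, and $\breve{\mathbf{n}}^p_k$ are the corresponding real-valued representations of $\mathbf{r}^p_k$, $\mathbf{G}$, $\mathbf{x}^p_k$, and $\mathbf{n}^p_k$, respectively. 
For square QAM where $M$ is an even integer, the first and the remaining $M/2$ bits of labels for the $2^M$-QAM 
are generally Gray coded separately as two $2^{M/2}$-PAM constellations, and represent the real and the 
imaginary axes, respectively. Assume that the same Gray coded mapping scheme is used for the real 
and the imaginary axes. As a result, each element of $\breve{\mathbf{x}}^p_k$ belongs to a real-valued signal set $\breve{\chi}$, and one 
bit in the label of $\breve{\mathbf{x}}^p_k$ corresponds to $c_{k'}$. The new position of $c_{k'}$ in the label of $\breve{\mathbf{x}}^p_k$ needs to be acquired 
as $k' \rightarrow (k, \breve{l}, \breve{i})$, which means $c_{k'}$ lies in the $\breve{i}$th bit position of the label for the $\breve{l}$th element of the real-valued 
vector symbol $\breve{\mathbf{x}}^p_k$. Let $\breve{\chi}^{\breve{i}}_b$ denote a subset of $\breve{\chi}$ whose labels have $b \in \{0, 1\}$ in the $\breve{i}$th bit position. Define 
$\breve{\psi}^{\breve{l},\breve{i}}_b \subset \breve{\chi}^{2P}$ as 

$$\breve{\psi}^{\breve{l},\breve{i}}_b = \{\mathbf{x} = [x_1 \cdots x_{2P}]^T : x_{v|v=\breve{l}} \in \breve{\chi}^{\breve{i}}_b, \text{ and } x_{v|v \neq \breve{l}} \in \breve{\chi}\}.$$

Then (7) is rewritten as 

$$\gamma^{l,i}(\mathbf{r}_k, c_{k'}) = \min_{\mathbf{x} \in \breve{\Omega}} \|\breve{\mathbf{r}}^p_k - \breve{\mathbf{G}} \mathbf{x}\|^2, \tag{9}$$

where $\breve{\Omega} \subset \breve{\psi}^{\breve{l},\breve{i}}_{c_{k'}}$, and $\|\breve{\mathbf{r}}^p_k - \breve{\mathbf{G}} \mathbf{x}\|^2 < \delta^2$. By using the QR decomposition $\breve{\mathbf{G}} = \mathbf{Q}\mathbf{R}$, where $\mathbf{R}$ is an 
upper triangular matrix and the matrix $\mathbf{Q}$ is unitary, (9) is rewritten as 

$$\gamma^{l,i}(\mathbf{r}_k, c_{k'}) = \min_{\mathbf{x} \in \breve{\Omega}} \|\tilde{\mathbf{r}}^p_k - \mathbf{R} \mathbf{x}\|^2, \tag{10}$$

where $\tilde{\mathbf{r}}^p_k = \mathbf{Q}^H \breve{\mathbf{r}}^p_k$. 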
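The chain of rewrites from the complex-valued metric to the QR form can be verified numerically. The sketch below assumes NumPy and uses the conventional stacking of real and imaginary parts as the real-valued representation, which is an assumption of this example and may differ from the ordering used in the paper. It checks that the squared distance is the same in all three forms.

```python
import numpy as np

def to_real(G, r):
    """Real-valued equivalent of r = G x using the stacking [Re; Im]."""
    G_real = np.block([[G.real, -G.imag], [G.imag, G.real]])
    r_real = np.concatenate([r.real, r.imag])
    return G_real, r_real

rng = np.random.default_rng(0)
P = 2
G = rng.standard_normal((P, P)) + 1j * rng.standard_normal((P, P))
x = np.array([1 - 1j, -1 + 1j]) / np.sqrt(2)
r = G @ x + 0.1 * (rng.standard_normal(P) + 1j * rng.standard_normal(P))

G_real, r_real = to_real(G, r)
x_real = np.concatenate([x.real, x.imag])
Q, R = np.linalg.qr(G_real)  # G_real = Q R with R upper triangular
r_tilde = Q.T @ r_real       # Q is real here, so Q^H equals Q^T

d_complex = np.linalg.norm(r - G @ x) ** 2
d_real = np.linalg.norm(r_real - G_real @ x_real) ** 2
d_qr = np.linalg.norm(r_tilde - R @ x_real) ** 2
print(d_complex, d_real, d_qr)
```

Because $\mathbf{Q}$ is orthogonal, multiplying by its transpose preserves the norm, so the minimizer is unchanged while the triangular structure of $\mathbf{R}$ allows the distance to be accumulated layer by layer.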

SD can now be viewed as a pruning algorithm on a tree of depth $2P$, whose branches correspond to 
elements drawn from the set $\breve{\chi}$, except for branches of the layer $u = \breve{l}$, which correspond to elements 
drawn from the set $\breve{\chi}^{\breve{i}}_{c_{k'}}$. SD starts the search process from the root of the tree, and then searches down 
along branches until the total weight of a node exceeds the square of the sphere radius, $\delta^2$. At this point, 
the corresponding branch is pruned, and any path passing through that node is declared as improbable for 
a candidate solution. Then the algorithm backtracks and proceeds down a different branch. Once a valid 
lattice point at the bottom level of the tree is found within the sphere, $\delta^2$ is set to the newly-found point 
weight, thus reducing the search space for finding other candidate solutions. In the end, the candidate 
solution corresponding to the path from the root to the leaf which is inside the sphere with the lowest 
weight is picked, and the corresponding weight is set to be the bit metric value. If no candidate solution 
is found, the tree will be searched again with a larger initial radius. SD can achieve the same performance 
as exhaustive search. 

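The node weight is calculated as [15], [16] 

$$w(\mathbf{x}^{(u)}) = w(\mathbf{x}^{(u+1)}) + w_{pw}(\mathbf{x}^{(u)}), \tag{11}$$

with $w(\mathbf{x}^{(2P+1)}) = 0$, $w_{pw}(\mathbf{x}^{(2P+1)}) = 0$, and $u = 2P, 2P-1, \cdots, 1$, where $\mathbf{x}^{(u)}$ denotes the partial vector 
symbol at layer $u$. The partial weight $w_{pw}(\mathbf{x}^{(u)})$ is written as 

$$w_{pw}(\mathbf{x}^{(u)}) = \Big| \tilde{r}^p_{k,u} - \sum_{v=u}^{2P} R_{u,v} x_v \Big|^2, \tag{12}$$

where $\tilde{r}^p_{k,u}$ is the $u$th element of $\tilde{\mathbf{r}}^p_k$, $R_{u,v}$ is the $(u,v)$th element of $\mathbf{R}$, and $x_v$ is the $v$th element of $\mathbf{x} \in \breve{\psi}^{\breve{l},\breve{i}}_{c_{k'}}$. 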
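The tree search with the recursive node weight can be sketched as a short depth-first routine. This is an illustrative implementation assuming NumPy, 0-based layers (so the search starts at layer `n - 1`), a random upper triangular `R`, and per-layer candidate lists in which one layer is restricted to a hypothetical bit-constrained subset. It is not the authors' implementation.

```python
import itertools
import numpy as np

def sphere_decode(r_tilde, R, candidates, radius_sq=np.inf):
    """Depth-first search; candidates[u] lists the allowed amplitudes of layer u."""
    n = R.shape[0]
    x = np.zeros(n)
    best = {"x": None, "weight": radius_sq}

    def search(u, weight):
        if u < 0:  # reached a leaf inside the sphere: shrink the sphere
            best["x"], best["weight"] = x.copy(), weight
            return
        for s in candidates[u]:
            x[u] = s
            partial = (r_tilde[u] - R[u, u:] @ x[u:]) ** 2  # partial weight
            if weight + partial < best["weight"]:           # otherwise prune
                search(u - 1, weight + partial)             # accumulated node weight

    search(n - 1, 0.0)
    return best["x"], best["weight"]

rng = np.random.default_rng(3)
n = 4  # 2P real layers with P = 2
Q, R = np.linalg.qr(rng.standard_normal((n, n)))
r_tilde = Q.T @ rng.standard_normal(n)
pam = [-3.0, -1.0, 1.0, 3.0]
candidates = [pam] * n
candidates[1] = [-3.0, -1.0]  # hypothetical layer constrained by the bit value

x_sd, w_sd = sphere_decode(r_tilde, R, candidates)
w_brute = min(float(np.sum((r_tilde - R @ np.array(c)) ** 2))
              for c in itertools.product(*candidates))
print(w_sd, w_brute)
```

The comparison against the brute-force minimum illustrates that pruning does not change the result. With a finite `radius_sq`, this sketch returns `None` for the point when no candidate lies strictly inside the sphere, and a point lying exactly on the surface is rejected by the strict comparison.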

B. Acquiring Initial Radius By ZF-DFE 

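The initial radius $\delta$ should be chosen properly, so that it is neither too small nor too large. Too small an 
initial radius results in too many unsuccessful searches and thus increases complexity, while too large an 
initial radius results in too many lattice points to be searched. 

In this work, for $c_{k'} = b$ where $b \in \{0, 1\}$, ZF-DFE is used to acquire an estimated real-valued vector 
symbol $\tilde{\mathbf{x}}^b_k$, which is also the Babai point [11]. Then the square of the initial radius $\delta^2$, which guarantees no 
unsuccessful searches, is calculated by 

$$\delta^2 = \|\tilde{\mathbf{r}}^p_k - \mathbf{R} \tilde{\mathbf{x}}^b_k\|^2. \tag{13}$$

The estimated real-valued vector symbol $\tilde{\mathbf{x}}^b_k$ is detected successively starting from $\tilde{x}^b_{k,2P}$ until $\tilde{x}^b_{k,1}$, where 
$\tilde{x}^b_{k,u}$ denotes the $u$th element of $\tilde{\mathbf{x}}^b_k$. The decision rule on $\tilde{x}^b_{k,u}$ is 

$$\tilde{x}^b_{k,u} = \begin{cases} \arg\min\limits_{x \in \breve{\chi}} \Big| \tilde{r}^p_{k,u} - \sum_{v=u+1}^{2P} R_{u,v} \tilde{x}^b_{k,v} - R_{u,u} x \Big|, & u \neq \breve{l}, \\ \arg\min\limits_{x \in \breve{\chi}^{\breve{i}}_b} \Big| \tilde{r}^p_{k,u} - \sum_{v=u+1}^{2P} R_{u,v} \tilde{x}^b_{k,v} - R_{u,u} x \Big|, & u = \breve{l}. \end{cases} \tag{14}$$

The estimation of the symbols in (14) can be carried out recursively by rounding (or quantizing) to the 
nearest constellation element in $\breve{\chi}$ or $\breve{\chi}^{\breve{i}}_b$. 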
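The successive detection described above can be sketched in a few lines. The example assumes NumPy, 0-based layers, a random upper triangular `R`, and per-layer candidate lists with one hypothetical constrained layer. It computes the ZF-DFE estimate and the resulting squared radius, and compares the radius with the true minimum found by brute force.

```python
import itertools
import numpy as np

def zf_dfe(r_tilde, R, candidates):
    """Detect layers from last to first, cancelling already decided symbols."""
    n = R.shape[0]
    x = np.zeros(n)
    for u in range(n - 1, -1, -1):
        residual = r_tilde[u] - R[u, u + 1:] @ x[u + 1:]
        cand = np.asarray(candidates[u], dtype=float)
        # Pick the allowed amplitude closest to the residual on this layer.
        x[u] = cand[np.argmin(np.abs(residual - R[u, u] * cand))]
    radius_sq = float(np.sum((r_tilde - R @ x) ** 2))  # weight of the estimate
    return x, radius_sq

rng = np.random.default_rng(5)
n = 4
Q, R = np.linalg.qr(rng.standard_normal((n, n)))
r_tilde = Q.T @ rng.standard_normal(n)
pam = [-3.0, -1.0, 1.0, 3.0]
candidates = [pam, [1.0, 3.0], pam, pam]  # layer 1 is the hypothetical constrained layer

x_babai, radius_sq = zf_dfe(r_tilde, R, candidates)
w_best = min(float(np.sum((r_tilde - R @ np.array(c)) ** 2))
             for c in itertools.product(*candidates))
print(radius_sq, w_best)
```

Since the estimate is itself a valid candidate, its weight can never be smaller than the optimum, so a sphere of this radius always contains at least one point, provided the boundary is treated as inside.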
C. Reducing Number of Executions in SD 

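For the $k$th time instant, the precoded real-valued vector symbol $\breve{\mathbf{x}}^p_k$ carries $MP$ bits. Since each bit 
generates two bit metrics, for $c_{k'} = 0$ and $c_{k'} = 1$, $2MP$ precoded bit metrics in total need to be 
acquired. However, some precoded bit metrics have the same value; hence SD can be modified to be 
executed fewer than $2MP$ times, as mentioned in [17]. 

Define $\hat{\mathbf{x}}_k$, $\hat{\mathbf{x}}^{\breve{l},\breve{i}}_{c_{k'}}$, and $\gamma_k$ as 

$$\hat{\mathbf{x}}_k = \arg\min_{\mathbf{x} \in \breve{\chi}^{2P}} \|\tilde{\mathbf{r}}^p_k - \mathbf{R} \mathbf{x}\|^2, \tag{15}$$

$$\hat{\mathbf{x}}^{\breve{l},\breve{i}}_{c_{k'}} = \arg\min_{\mathbf{x} \in \breve{\psi}^{\breve{l},\breve{i}}_{c_{k'}}} \|\tilde{\mathbf{r}}^p_k - \mathbf{R} \mathbf{x}\|^2, \tag{16}$$

and 

$$\gamma_k = \|\tilde{\mathbf{r}}^p_k - \mathbf{R} \hat{\mathbf{x}}_k\|^2, \tag{17}$$

respectively. Note that $\breve{\psi}^{\breve{l},\breve{i}}_0 \cup \breve{\psi}^{\breve{l},\breve{i}}_1 = \breve{\chi}^{2P}$ and $\breve{\psi}^{\breve{l},\breve{i}}_0 \cap \breve{\psi}^{\breve{l},\breve{i}}_1 = \emptyset$. Then 

$$\gamma_k = \min\{\gamma^{l,i}(\mathbf{r}_k, c_{k'} = 0), \gamma^{l,i}(\mathbf{r}_k, c_{k'} = 1)\}, \tag{18}$$

which means that, for each of the $MP$ bits corresponding to $\breve{\mathbf{x}}^p_k$, the smaller of the two precoded bit metrics for 
$c_{k'} = 0$ and $c_{k'} = 1$ has the same value $\gamma_k$. 

Let $b^{\breve{i}}_{\breve{l}} \in \{0, 1\}$ denote the value of the $\breve{i}$th bit in the label of $\hat{x}_{k,\breve{l}}$, which is the $\breve{l}$th element of $\hat{\mathbf{x}}_k$. Then 

$$\gamma^{l,i}(\mathbf{r}_k, c_{k'} = b^{\breve{i}}_{\breve{l}}) = \gamma_k. \tag{19}$$

First, the two bit metrics $\gamma^{l,i}(\mathbf{r}_k, c_{k'} = 0)$ and $\gamma^{l,i}(\mathbf{r}_k, c_{k'} = 1)$ for one of the $MP$ bits corresponding to $\breve{\mathbf{x}}^p_k$ 
and their related $\hat{\mathbf{x}}^{\breve{l},\breve{i}}_{c_{k'}}$ are derived by SD. Then the $\hat{\mathbf{x}}^{\breve{l},\breve{i}}_{c_{k'}}$ corresponding to the smaller bit metric is chosen 
to be $\hat{\mathbf{x}}_k$, and $\gamma_k$ is acquired by (18). For each of the other $MP - 1$ bits, $\gamma^{l,i}(\mathbf{r}_k, c_{k'} = b^{\breve{i}}_{\breve{l}})$ is acquired by 
(19), and $\gamma^{l,i}(\mathbf{r}_k, c_{k'} = \bar{b}^{\breve{i}}_{\breve{l}})$ is calculated by SD. Consequently, the execution number of SD for one time 
instant is reduced from $2MP$ to $MP + 1$. 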
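The bookkeeping behind the reduction from $2MP$ to $MP + 1$ executions can be illustrated as follows. In this sketch, which assumes NumPy, a brute-force constrained search stands in for one SD execution so that the example stays short. The Gray-labelled 4-PAM table, the two real layers (that is, $P = 1$ and $M = 4$), and all function names are assumptions of the example.

```python
import itertools
import numpy as np

PAM4 = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): 1.0, (1, 0): 3.0}  # Gray-labelled 4-PAM

def constrained_search(r_tilde, R, layer, bit, value, counter):
    """Stand-in for one SD execution: best label vector with one bit fixed."""
    counter[0] += 1
    best_w, best_labels = np.inf, None
    for labels in itertools.product(PAM4, repeat=R.shape[0]):
        if labels[layer][bit] != value:
            continue
        x = np.array([PAM4[lab] for lab in labels])
        w = float(np.sum((r_tilde - R @ x) ** 2))
        if w < best_w:
            best_w, best_labels = w, labels
    return best_w, best_labels

def all_bit_metrics(r_tilde, R, bits_per_layer=2):
    counter, metrics = [0], {}
    positions = [(u, i) for u in range(R.shape[0]) for i in range(bits_per_layer)]
    # Two executions for the first bit; the smaller metric is the global minimum.
    u0, i0 = positions[0]
    first = [constrained_search(r_tilde, R, u0, i0, b, counter) for b in (0, 1)]
    metrics[(u0, i0, 0)], metrics[(u0, i0, 1)] = first[0][0], first[1][0]
    gamma_k, x_hat = min(first, key=lambda t: t[0])
    for u, i in positions[1:]:
        b = x_hat[u][i]
        metrics[(u, i, b)] = gamma_k  # bit value of the best point: no search needed
        metrics[(u, i, 1 - b)], _ = constrained_search(r_tilde, R, u, i, 1 - b, counter)
    return metrics, counter[0]

rng = np.random.default_rng(7)
Q, R = np.linalg.qr(rng.standard_normal((2, 2)))
r_tilde = 3.0 * rng.standard_normal(2)
metrics, executions = all_bit_metrics(r_tilde, R)
print(len(metrics), executions)
```

Here the four label bits yield eight metrics from five searches, matching $MP + 1$ with $MP = 4$.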

D. Reducing Number of Operations in SD 

In our previous work [18], a technique was introduced to implement SD with low computational 
complexity, which achieves the same performance as exhaustive search. That technique can 
be employed to achieve substantial further complexity reduction for BICMB-CP. In this subsection, a brief 
description of the technique is presented for reducing the number of real multiplications. 

Note that for one channel realization, both $\mathbf{R}$ and $\breve{\chi}$ are independent of time. In other words, to decode 
different received symbols for one channel realization, the only term in (12) which depends on time is 
$\tilde{r}^p_{k,u}$. Consequently, a check-table $\mathbb{T}$ is constructed to store all terms of $R_{u,v} x$, where $R_{u,v} \neq 0$ and $x \in \breve{\chi}$, 
before starting the tree search procedure. Equations (11) and (12) imply that, by using $\mathbb{T}$, only one real multiplication 
is needed for each node to calculate the node weight, instead of $2P - u + 2$. As a result, the 
number of real multiplications can be significantly reduced. 

Note that $\breve{\chi}$ can be divided into two smaller sets, $\breve{\chi}_1$ with negative elements and $\breve{\chi}_2$ with positive elements. 
Any negative element in $\breve{\chi}_1$ has a positive element with the same absolute value in $\breve{\chi}_2$. Consequently, in 
order to build $\mathbb{T}$, only terms of $R_{u,v} x$, where $R_{u,v} \neq 0$ and $x \in \breve{\chi}_1$, need to be calculated and stored. 
Since the channel is assumed to be flat fading, only one $\mathbb{T}$ needs to be built in one burst. If the burst 
length is very long, its complexity can be neglected. 
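A possible realization of the check-table is sketched below. It assumes NumPy, 0-based indices, a dictionary keyed by `(u, v, amplitude)`, and a 4-PAM alphabet. These data-structure choices are illustrative and not taken from the paper. Only products with negative amplitudes are stored, and products with positive amplitudes are obtained by a sign flip.

```python
import numpy as np

def build_table(R, negative_levels):
    """Store R[u, v] * a for every nonzero R[u, v] and every negative amplitude a."""
    n = R.shape[0]
    return {(u, v, a): R[u, v] * a
            for u in range(n) for v in range(u, n) if R[u, v] != 0.0
            for a in negative_levels}

def product_from_table(table, u, v, x):
    """R[u, v] * x by lookup; a positive x reuses the entry of -x with a sign flip."""
    if x < 0:
        return table.get((u, v, x), 0.0)
    return -table.get((u, v, -x), 0.0)

def partial_weight(table, r_tilde, x, u):
    acc = r_tilde[u]
    for v in range(u, len(x)):
        acc -= product_from_table(table, u, v, float(x[v]))
    return acc * acc  # the single real multiplication left at this node

rng = np.random.default_rng(11)
n = 4
_, R = np.linalg.qr(rng.standard_normal((n, n)))
r_tilde = rng.standard_normal(n)
x = np.array([3.0, -1.0, 1.0, -3.0])

table = build_table(R, negative_levels=(-3.0, -1.0))
from_table = [partial_weight(table, r_tilde, x, u) for u in range(n)]
direct = [float((r_tilde[u] - R[u, u:] @ x[u:]) ** 2) for u in range(n)]
print(from_table)
print(direct)
```

The table is built once per channel realization, after which each node evaluation needs only additions, lookups, and the final squaring.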

In our previous work [15], [16], a new lattice representation was introduced. In this work, the same lattice 
representation is applied to (8), but with a new application. The structure of the lattice representation 
becomes advantageous after applying the QR decomposition to $\breve{\mathbf{G}}$. By doing so, and due to the special 
form of orthogonality between each pair of columns, all elements $R_{u,u+1}$ for $u = 1, 3, \ldots, 2P - 1$ in the 
upper triangular matrix $\mathbf{R}$ become zero. The locations of these zeros introduce orthogonality between the 
real and the imaginary parts of every detected symbol, which can be taken advantage of to reduce the 
computational complexity of SD. 
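The zero pattern in $\mathbf{R}$ can be observed numerically. The sketch below assumes NumPy and a real-valued representation that interleaves the real and imaginary parts of each symbol, so that each complex column becomes an orthogonal pair of adjacent real columns. This ordering is an assumption of the example about the lattice representation referred to above.

```python
import numpy as np

def interleaved_real(G):
    """Real representation ordered as [Re x1, Im x1, Re x2, Im x2, ...]."""
    rows, cols = G.shape
    G_real = np.zeros((2 * rows, 2 * cols))
    G_real[0::2, 0::2] = G.real
    G_real[0::2, 1::2] = -G.imag
    G_real[1::2, 0::2] = G.imag
    G_real[1::2, 1::2] = G.real
    return G_real

rng = np.random.default_rng(1)
P = 3
G = rng.standard_normal((P, P)) + 1j * rng.standard_normal((P, P))
Q, R = np.linalg.qr(interleaved_real(G))

# 0-based even rows correspond to the odd layers u = 1, 3, ..., 2P - 1 of the text.
odd_layer_entries = np.array([R[u, u + 1] for u in range(0, 2 * P, 2)])
print(np.round(odd_layer_entries, 12))
```

The printed entries vanish up to rounding error for any complex matrix, which reflects that the two real columns generated by one complex column stay orthogonal after the earlier column pairs have been projected out.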

Based on this feature, SD is modified in the following way: once the tree is searched in layer $u$, where 
$u$ is an odd number, partial weights of this node and all of its sibling nodes are computed, temporarily 
stored, and reused when calculating partial node weights with the same grandparent node of layer $u + 2$ 
but with different parent nodes of layer $u + 1$. By implementing the modification, further complexity 
reduction is achieved. 

IV. Simulation Results 

Since the $P$-dimensional complex-valued input-output relation of the precoded part in (3) can be viewed 
as a $P$-dimensional BICMB-FP, BICMB-FP is considered to verify the proposed technique. Exhaustive 
Search (EXH), Conventional SD (CSD), and Proposed Smart Implementation (PSI), which combines 
Section III-C and Section III-D, are applied. The average number of real multiplications, the most expensive 
operations in terms of machine cycles, for acquiring one bit metric is calculated at different SNR. 
Fig. 2 shows comparisons for $2 \times 2$, $S = 2$, $R_c =$ § BICMB-FP. For 4-QAM, CSD reduces the complexity of EXH 
by 0.4 and 0.5 orders of magnitude at low and high SNR, respectively. PSI yields 
larger reductions of 1.1 and 1.2 orders of magnitude at low and high SNR, respectively. In the case of 
64-QAM, the reductions of CSD relative to EXH are 1.5 and 2.1 orders of magnitude at low and high SNR, 
respectively, while larger reductions of 2.6 and 3.0 orders of magnitude are achieved by PSI. 

Similarly, Fig. 3 shows complexity comparisons for $4 \times 4$, $S = 4$, $R_c =$ ~ BICMB-FP. For 4-QAM, 
CSD reduces the complexity of EXH by 1.3 and 1.5 orders of magnitude at low and high SNR, respectively. 
PSI gives larger reductions of 2.3 orders of magnitude at low SNR and 2.4 orders of magnitude at high 
SNR. For the 64-QAM case, reductions of CSD relative to EXH of 3.2 and 4.4 orders of magnitude are 
observed at low and high SNR, respectively, while larger reductions of 4.4 and 5.4 orders of magnitude are achieved by PSI. 

Simulation results show that CSD reduces the complexity substantially compared to EXH, and the 
complexity can be further reduced significantly by PSI. The reductions become larger as the constellation 
precoder dimension and the modulation alphabet size increase. One important property of our decoding 
technique that needs to be emphasized is that the substantial complexity reduction achieved causes no 
performance degradation. 

V. Conclusion 

In this paper, a reduced complexity decoding scheme for BICMB-CP is presented. SD with initial 
radius calculated by ZF-DFE is used to acquire precoded bit metrics needed for the Viterbi decoder. SD 
can achieve the same performance as exhaustive search, and more importantly, achieves a substantial 
complexity reduction. Two techniques are applied to reduce both the number of executions and operations 
for SD substantially. Therefore, BICMB-CP can be considered as a practical application for MIMO systems 
requiring high throughput with the full diversity order. The reduced complexity decoding in this paper 
can be applied to any convolutional coded MIMO system. 
Fig. 1. Structure of BICMB-CP. 
Fig. 2. Average number of real multiplications vs. SNR for 2 x 2 S = 2 BICMB-FP. 
Fig. 3. Average number of real multiplications vs. SNR for 4 x 4 S = 4 BICMB-FP. 



